US006381597B1 

(12) United States Patent (10) Patent No.: US 6,381,597 B1

Lin (45) Date of Patent: Apr. 30, 2002 



(54) ELECTRONIC SHOPPING AGENT WHICH 
IS CAPABLE OF OPERATING WITH 
VENDOR SITES WHICH HAVE DISPARATE 
FORMATS 

(75) Inventor: Simon M. Lin, Andover, MA (US)

(73) Assignee: U-Know Software Corporation, 
Woburn, MA (US) 

( * ) Notice: Subject to any disclaimer, the term of this 
patent is extended or adjusted under 35 
U.S.C. 154(b) by 0 days. 

(21) Appl. No.: 09/414^77

(22) Filed: Oct. 7, 1999 



(51) Int. Cl. G06F 17/30

(52) U.S. Cl. 707/4; 707/3

(58) Field of Search 707/4, 3, 2, 1,
707/5, 10, 102, 104; 705/1, 26, 27

(56) References Cited 

U.S. PATENT DOCUMENTS 

6,119,101 A * 9/2000 Peckover 705/26

6,185,558 B1 * 2/2001 Bowman et al. 707/5

6,301,584 B1 * 10/2001 Ranger 707/103

6,304,854 B1 * 10/2001 Harris 705/27

6,317,718 B1 * 11/2001 Fano 705/1



* cited by examiner 



Primary Examiner — Sanjiv Shah 

(74) Attorney, Agent, or Firm— Kudirka & Jobse, LLP 

(57) ABSTRACT 

A shopping bot uses real-time agents that automatically
contact disparate web pages representing a vast variety of
different categories and merchants, and that retrieve and
unify the information therein for display when a request for
the information is made. Consequently, there is no need to
create a "wrapper" or an "information adapter" for each
category or each merchant, because the same agent can
retrieve and process information in various formats. In
particular, the shopping bot generates queries from keywords
entered by a user and a database of URL information.
Information returned by the queries is filtered, parsed and
mapped to a standard format. The formatted information can
then be displayed. Since the information is converted to the
standard format in real time, online merchants and additional
product categories can be added quickly and easily. Further,
the information about a product item can be easily enlarged
as market needs increase without changing a database of
codes one-by-one. In accordance with one embodiment, the
database of URL information includes URLs specific to site
directories at each merchant site so that queries can be
easily generated by appending user-provided keywords.

25 Claims, 7 Drawing Sheets 

ELECTRONIC SHOPPING AGENT WHICH
IS CAPABLE OF OPERATING WITH
VENDOR SITES WHICH HAVE DISPARATE
FORMATS

FIELD OF THE INVENTION 

This invention relates to electronic shopping agents or
"bots" which operate over the Internet on behalf of a client
to locate on-line vendors which provide goods and services
of interest to the client.

BACKGROUND OF THE INVENTION 

The Internet and web-related technology have become 
widespread as personal computers have become more preva- 
lent. One of the fastest growing business sectors is electronic 
commerce, particularly, retail consumer shopping. The Inter- 
net allows consumers to quickly locate goods and services of 
interest to them. In many cases, images of the goods can be 
viewed and orders placed directly over the web. The con- 
sumer may provide payment electronically via credit cards 
and the goods are then shipped to the consumer. Compara- 
tive shopping using the Internet as a search and retrieval tool 
to locate and retrieve information and prices for comparable 
products is also a fast-growing area. There are already 
several comparative shopping tools that are available on the 
Internet, such as Junglee and Jango, for example. These 
shopping tools accept keywords and category information as 
inputs from consumers. The keywords and category infor- 
mation are used to create an autonomous agent or "shopping 
bot" which scans over the Internet and locates related 
products from a set of online merchants. The product items
that are located and returned by the shopping bot are then
presented to the consumer in a simple tabular form to
enable comparison shopping.

While the existing shopping tools can help users to do 
comparative shopping, there are several limitations among 
all the existing shopping bots. The first limitation is that the 
number of online merchants included in the comparison pool 
that a user can access and use for comparisons is small. This
limitation can be mainly attributed to a historical fact, i.e. the 
evolution of the Internet. The Internet was originally 
designed to operate with information coded in a very spe- 
cific format called HyperText Markup Language (HTML). 
HTML is a presentation language that uses codes embedded 
in the document to define how a particular segment of a 
document is presented on a display mechanism such as a 
Web browser. Although HTML has a predefined and fixed 
format, it does not give any information about the meaning 
or semantics of the information which it is used to format. 
Therefore, although Web browsers can read HTML and use 
the HTML codes to identify selected parts of the 
information, such as text and graphics, the browsers cannot 
use the HTML codes to extract information from the iden- 
tified document parts. In addition, web pages often differ 
drastically depending on the taste, preference, and marketing 
needs of different designers and merchants. 

As a result, it is difficult and tedious for a search engine
to extract specific information, such as item price, from a
wide variety of different HTML coded web pages. In order
to overcome this problem, some shopping bots use
mechanisms such as so-called "wrappers" or "information
adapters." One of these mechanisms is programmed to
discover or "learn" about each product category in each
merchant site. However, these mechanisms are very slow and
it usually takes from hours to days to include a new merchant
in the comparison pool. Furthermore, if the merchant changes
its web page formats, it will also take a long time to change
the mechanism used for that merchant site in order to make
the search engine continue to work. The continual flux of the
Internet requires the shopping bot providers to employ many
programmers to design and maintain their services and the
number of merchant sites covered is necessarily small.

The second limitation is the amount of information a
consumer can get through existing shopping bots. Currently,
most shopping bots provide consumers with only limited
information such as price, a brief description of items, and
a merchant link. However, from a consumer's point of view,
price may not be the only criterion on which to base a
shopping decision. Other factors, such as shipping date,
warranty information, credibility of a merchant, and
service, often affect shopping decisions. Although it is
possible to add additional criteria to existing shopping bots,
doing so means changing hundreds or thousands of "wrappers"
or "information adapters" corresponding to different
categories, subcategories, and merchant sites. Such a task is
a very costly investment both in terms of time and human
resources.

The third limitation on existing shopping bots is the
performance and accuracy of search results. Most existing
shopping bots are very slow and take minutes to generate
search results. Consequently, many existing systems store
information retrieved from merchant sites in a local database
so that searches are greatly accelerated. However, the local
databases are only periodically updated by contacting the
merchant sites. Therefore, the results of the search are often
out-of-date and do not accurately reflect the actual
situation at the merchant site.

Therefore, there is a need for a shopping bot which can
operate with a variety of different merchant site formats and
which can quickly adapt to new formats or changes to
existing merchant sites.

There is a further need for a shopping bot which can be
easily and quickly modified to retrieve and display new and
different information from that currently being displayed.

There is a further need for a shopping bot which can
quickly provide accurate and timely information to
consumers.

SUMMARY OF THE INVENTION 

In accordance with the principles of the invention, a
shopping bot uses real-time agents that automatically contact
disparate web pages representing a vast variety of different
categories and merchants, and that retrieve and unify the
information therein for display when a request for the
information is made. Consequently, there is no need to create
a "wrapper" or an "information adapter" for each category or
each merchant, because the same agent can retrieve and
process information in various formats.

In particular, the shopping bot generates queries from
keywords entered by a user and a database of URL
information. Information returned by the queries is filtered,
parsed and mapped to a standard format. The formatted
information can then be displayed. Since the information is
converted to the standard format in real time, online
merchants and additional product categories can be added
quickly and easily. Further, the information about a product
item can be easily enlarged as market needs increase without
changing a database of codes one-by-one.

In accordance with one embodiment, the database of URL
information includes URLs specific to site directories at
each merchant site so that queries can be easily generated by
appending user-provided keywords.
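
The query generation described in this embodiment can be sketched as follows. This is a minimal illustration only, not the patented implementation: the dictionary standing in for the URL database, the function name build_queries, and the pairing of each stored URL with a query parameter name are assumptions introduced here. The two merchant URLs are the ones quoted in the detailed description.

```python
from urllib.parse import quote_plus

# Hypothetical stand-in for the URL database: category -> list of
# (stored site-search URL, query parameter name). Storing the parameter
# name alongside the URL is an assumption made for this sketch.
URL_DATABASE = {
    "books": [
        ("http://www.amazon.com/exec/obidos/external-search/?", "keyword"),
        ("http://www.kingbooks.com/scripts/search3.exe?by=keywords&", "keywords"),
    ],
}


def build_queries(category, keywords):
    """Return one query URL per merchant site stored under the category."""
    # Join the keywords and encode spaces as '+', as in the example queries.
    encoded = quote_plus(" ".join(keywords))
    return [
        "{}{}={}".format(stored_url, param, encoded)
        for stored_url, param in URL_DATABASE.get(category, [])
    ]


queries = build_queries("books", ["child", "spousal", "support"])
for query in queries:
    print(query)
```

Because each stored URL already points at the merchant's own site search, adding a merchant in this sketch amounts to adding one entry to the table rather than writing site-specific code.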

In accordance with another embodiment, information
which is retrieved from merchant sites comprises
information which would normally be displayed by a browser.
This information, which is typically in HTML or XML format,
is parsed and filtered, and a hierarchical tree structure is
used to map the information to desired categories before
displaying the information for comparison.

In accordance with still another embodiment, efficient
caching and distributed algorithms are used to reduce
consumer response time.

BRIEF DESCRIPTION OF THE DRAWINGS 

The above and further advantages of the invention may be
better understood by referring to the following description in
conjunction with the accompanying drawings in which:

FIG. 1 is a block schematic diagram of a networking
arrangement which includes the Internet and connects
several local computer systems to remote servers.

FIG. 2 is a block schematic diagram which illustrates the
major components of the inventive shopping system.

FIGS. 3A and 3B, when placed together, form a flowchart
which illustrates the inventive process of generating queries
in parallel from stored URLs, issuing the queries to merchant
sites and processing the results in parallel.

FIG. 4 is a flowchart which illustrates the process of
extracting relevant information from query results.

FIG. 5 is a schematic diagram of information in an
example tree branch having three node levels.

FIG. 6 is a schematic diagram of information in another
example tree branch having three nodes.

FIG. 7 is a schematic diagram of information in a further
example tree branch also having three nodes.

DETAILED DESCRIPTION

FIG. 1 shows a commonly used network arrangement in
which local computer systems 100 and 102 are connected by
a local area network (LAN) 104 to a local server 106 which
may access a plurality of remote servers 110-114 through
the Internet 108. Each remote server 110-114 may include
World Wide Web sites (web sites) that each include a
plurality of World Wide Web pages (web pages). Each local
computer system 100 and 102, of which system 100 is
shown in more detail, may access the remote web sites with
web browser software 101, such as Netscape Navigator™,
available from Netscape Communications Corporation of
Mountain View, Calif., or Internet Explorer available from
Microsoft Corporation, Redmond, Wash.

A web site has a home page which constitutes the highest
level in the hierarchy. The home page typically contains
general information about the merchant, including graphic
images, and may contain other information such as a menu
allowing a user who visits the web site to navigate to the
other web pages that constitute the site. The site may also
include a site directory that is a web page that contains links
to the other web pages. Often a site directory includes a site
search feature which is an integral search engine that accepts
user input in the form of keywords and searches the site for
matches. Information of use to a consumer, such as item
descriptions and prices, would typically be located on lower
levels of the hierarchy. Ordering information, such as credit
card information, might be located at still another level of
the site.

The World Wide Web is actually a collection of servers on
the Internet 108 that utilize the Hypertext Transfer Protocol
(HTTP). HTTP is a known application protocol that
provides users with access to files (which can be in different
formats, such as text, graphics, images, sound, and video)
using a standard page description language known as
Hypertext Markup Language (HTML). Among a number of basic
document formatting functions, HTML allows software
developers to specify graphical pointers on displayed web
pages (commonly referred to as "hyperlinks") that point to
other web pages resident on remote servers. Hyperlinks
commonly are displayed as highlighted text or another
graphical image on the web page. Selection of a hyperlink
with a pointing device, such as a computer mouse, causes the
local computer to download the HTML code of an associated
web page from a remote server. The location of the web page
is expressed as a "uniform resource locator" (URL). This
method provides the remote server with the necessary
information to upload the remote web page associated with the
selected point to the local computer.

Web sites constructed by on-line merchants contain
descriptions and/or pictures of goods or services for sale.
Each site is typically arranged in a hierarchical branching
tree structure having a plurality of nodes that contain one or
more of the web pages in the site. Each of the nodes in the
site is considered to be on one of various levels of each
branch in the tree structure. For example, a first node is
considered to be on a lower level than a second node in the
same branch if a web page in the first node includes the
second node in its URL. Conversely, a third node in the same
branch is considered to be on a higher level than the second
node if the URL of a web page in the second node includes the
third node. Web pages are accessed over the Internet, via the
browser software 101, and commonly are downloaded into
a cache 103 of the local computer system 100. The browser
software 101 then uses the HTML code to position the
various files on a display screen.
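
One possible reading of the level relationship described above is that a node is lower than another when the other node's URL path is a proper prefix of its own. The sketch below illustrates that reading only; the helper names and the example.com URLs are hypothetical and do not come from the specification.

```python
from urllib.parse import urlsplit


def path_segments(url):
    """Split a URL into its host followed by its non-empty path segments."""
    parts = urlsplit(url)
    return [parts.netloc] + [seg for seg in parts.path.split("/") if seg]


def is_lower_level(first_url, second_url):
    """True if first_url sits below second_url in the same branch.

    Assumption of this sketch: "includes the second node in its URL"
    is interpreted as "has the second node's path as a proper prefix".
    """
    first = path_segments(first_url)
    second = path_segments(second_url)
    return len(first) > len(second) and first[:len(second)] == second


# Hypothetical URLs used only for illustration.
item_page = "http://www.example.com/books/law/item1.html"
category_page = "http://www.example.com/books/"
other_branch = "http://www.example.com/music/"

print(is_lower_level(item_page, category_page))   # item page is below /books/
print(is_lower_level(category_page, item_page))
print(is_lower_level(item_page, other_branch))
```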

The inventive shopping system is illustrated in FIG. 2 and
consists of a search engine kernel (SEK) 206, one or more
search engines 208-212 and one or more automatic learning
objects (ALOs) 214-218. The kernel 206 might be located
in the local server (106, FIG. 1) and interacts with one or
more users 200-204 to receive a user's request for
information and to send the results back to the requesting
user. Preferably, the SEK 206 is platform independent so that
it can run on any hardware platform and operating system. In
one embodiment, the SEK 206 is written in the Java
programming language licensed by Sun Microsystems, Inc. and
can operate on any platform as long as a Java environment
is operating on that platform.

The SEK 206 first analyzes a request generated by a user 
and then starts appropriate functions according to the user's 
request. For example, the SEK 206 might receive a request 
for information relating to books on a particular topic, such 
as "child spousal support" from one of users 200-204. The 
request criteria can be entered by the user in a variety of
ways. The criteria could be entered via an interactive
interface in which the user answers a series of questions
based on keywords. Alternatively, the user might use
drop-down lists and menus to select a predefined category,
such as "books", from a list or a graphic display of items.
Generally, the request would include at least a category, such 
as "book" or "auto" and selected keywords, such as "child", 
"spousal" and "support" or a phrase "child spousal support" 
which can be broken down by a conventional parser into one 
or more keywords. 
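
As a rough illustration of the request structure just described (a category plus keywords obtained by breaking down a phrase), consider the sketch below. The ShoppingRequest class and parse_request function are hypothetical names, and plain whitespace splitting stands in for the "conventional parser" mentioned in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ShoppingRequest:
    """Hypothetical container for a user's request."""
    category: str
    keywords: List[str]


def parse_request(category, phrase):
    """Normalize the category and break the phrase into keywords.

    Whitespace splitting is an assumption of this sketch; the
    specification only refers to a conventional parser.
    """
    keywords = [word.lower() for word in phrase.split()]
    return ShoppingRequest(category=category.strip().lower(), keywords=keywords)


request = parse_request("Books", "child spousal support")
print(request)
```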

In response, the SEK 206 generates one or more queries 
relating to the selected topic. In order to generate these 
queries, the SEK 206 uses an internal database 220 of URLs. 
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This database can include entries set up be participating is formed using the URL which matches the keyword for the 

vendors, or might be a manually downloaded table, or merchant site. In this manner queries are generated for each 

provided by the company which maintains the inventive merchant site having a URL in the selected category. When 

shopping system. The database 220 contains a plurality of each query is generated, the SEK spawns a search engine 

URLs arranged by category. When a category is entered by 5 thread, for example, search engine 208 and provides it with 

a user, the URLs for that category are returned by the the query for a merchant site. The search engine 208 issues 

database 220. the query to the appropriate merchant site. In response, the 

In accordance with an important aspect of the invention, merchant site returns the results of the query. The results 

the URLs stored in database 220 are not the URLs for a from a merchant site are received by the search engine 208 

particular merchant site, but are instead the URLs for the site that issued the query and forwarded to the SEK 206. When 

directory and, in particular, for the site search of each site. results are received, the SEK creates an ALO, for example 

These URLs can be easily combined with the keywords aLO 214 for each merchant site. The ALOs process the 

entered by the user to form a query. This latter query utihzes results to extract relevant information. The extracted infor- 

the internal search engine existing at most merchant sites to mation is returned to the SEK 206 for display formatting, 

perform at least part of the search, thereby relieving the 35 Finally, the formatted results are returned to the one of users 

inventive system of having to construct a query which is 200-204 which made the initial request. The entire process 

specific to each merchant site. For example, continuing the is illustrated in the flowchart shown in FIGS. 3A and 3B, 

above example, if the user has selected the category which, when placed together, form the flowchart, 

"books." The SEK 206 might retrieve the URLs for online The process starts in step 300 and proceeds to step 302 

book vendors such as amazon.com and kingbooks.com from 20 where an attempt is made to retrieve a merchant URL from 

the database 220, However, the URLs actually retrieved the SEK database 220 under the category which has been 

from the database 220 are the URLs for the search engines selected by the user. In step 304, a check is made to 

at these sites: determine whether the attempt was successful. If no addi- 

http://www.amazon.com/exec/obidos/extemal-search/? tional URLs remain and the attempt is not successful, the 

and 25 process proceeds to step 306 and finishes. 

http://www.kingbooks. com/scrip ts/search3.exe?by= Alternatively, if, in step 304, an additional URL is 

keywords& retrieved from database 220, then the process proceeds to 

These URLs are in a form which can readily be combined step 308. In step 308, a query is generated by concatenating 

with the keywords entered by the user in order to form a the URL and the keywords entered by the user to generate 

query such as: 30 a query. As previously mentioned, in many cases, this query 

http://www, amazon.com/exec/obidos/external-search/ will be directed towards a site search engine located at the 

?keyword=child+spousal+support and merchant site. Next, in step 310, a local search engine thread 

http://www.kingbooks.com/scripts/search3. exe?by= is created which issues the query to the corresponding web 

keywords&keywords^child+spousal +support site. The process then proceeds back to step 302 to retrieve 

These queries are advantageous because they use the 35 another URL and generate another search engine thread, 

built-in search engines in the respective web sites to perform The operation of each search thread is illustrated in FIG. 

the actual search, thus reheving the SEK from having to 3B. In particular, the process proceeds, via off-page connec- 

compose a customized search for each site and changing the tors 312 and 316 to step 318 in which the SEK creates an 

customized search when the site changes. automatic learning object to receive the search results from 

However, some merchant sites do not have an internal 40 a merchant site. The query results generated by the merchant 

search engine. Instead, some sites have an on-line catalog site search engine are received by the search engine and 

while other sites are simple web pages. In the case where the forwarded to previously-created ALO as set forth in step 

merchant site has a catalog, it may be necessary for a 320. Data from a site may not be returned all at once, but 

programmer to enter the site and navigate to a section of the processing begins by the ALO as soon as data are received, 

catalog where user selections can be made. The URLs which 45 In step 322, the local search engine which issued the 

correspond to these catalog sections can then often be query then waits to determine whether additional results will 

combined with the user-entered keywords to generate the be provided by the merchant site. If additional results are 

required query. Database 220 may contain several URLs for received, then step 320 is repeated. Alternatively, if there are 

a single merchant site where each URL is mapped to one or no additional results as determined in step 322, the process 

more keywords. When a user enters the keywords for the 50 finishes in step 324. 

query, the keywords are used to select from the URLs for a The SEK 206 keeps track of all ALOs 208-212 generated 

merchant site and then the final query is generated by and performs the necessary synchronization between the 

combining the selected keywords with the user enter in for- ALOs 208-212. Advantageously, the search process is con- 

malion. ducted in parallel with each query being processed by a 

In the case where the merchant site is a simple collection 55 separate search engine thread and the corresponding results 

of web pages, a programmer must enter the site and navigate are processed by a separate ALO. This parallel processing 

directly to a web page which displays an item. The URL greatly reduces the time required to obtain comparative 

which identifies this latter web page is then entered into the results. In addition, the procedure that receives a request 

database 220 and mapped to various keywords selected from message from a user and initiates a search engine is kept 

the web page content. Subsequently, when a user enters 60 very short so that it takes minimum amount of time. Various 

keyword information, the information is used to select URLs ALOs in the SEK share the same resources making inter 

from the database 220 which are mapped to matching process communication more eflScient and eliminating 

keywords. This latter approach is not as advantageous as unnecessary mapping. 

using an internal site search because the mapping must be Based on the category, subcategory, product name, and 

changed if the merchant site is changed. 65 other related information provided by the user, the SEK 206 

As previously mentioned, a keyword for a merchant site initiates one or more search engines 208-212 that examine 

is selected based on the user-selected category and a query all merchant sites that may have the product information 
requested by the user. Each search engine takes a query
generated by the SEK and goes to the associated web site to
retrieve the desired information. In general, the information
retrieved from a web site by the aforementioned queries is
intended for display by a browser. Often the information will
be encoded using "markup languages" such as HTML or
XML or other presentation languages.

HTML is a simple "markup language" that is suited for
the display of small and reasonably simple documents which
are commonly transmitted on the World Wide Web. Another
markup language called the Extensible Markup Language
(XML) is often used for more complicated documents that
require capabilities beyond those provided by HTML. XML
is more extensible, allows for validation and defines how
URLs can be used to identify component parts of XML
documents.

HTML and XML documents are composed of a series of
entities or objects. Each entity can contain one or more
logical elements and each element can have certain
attributes or properties that describe the way in which it is
to be processed. Both languages provide a formal syntax for
describing the relationships between the entities, elements
and attributes that make up a document. This syntax tells a
computer how to recognize the component parts of each
document.

HTML and XML use paired markup tags to identify
document components. The markup tags are easily
recognized codes that are added to a document to identify each
document component. In particular, the start and end of each
logical element is clearly identified by entry of a start-tag
before the element and an end-tag after the element. For
example, the tags <to> and </to> could be used to identify
the "recipient" element of a document in the following
manner:

document text . . . <to>Recipient</to> . . . document text.

The arrangement of tags is hierarchical in that some
tagged document portions can contain other tagged
document portions. In order to operate with a set of tags, users
need to know how the markup tags are delimited from
normal text and the relationship between the various
elements. For example, in some XML systems, elements and
their attributes are entered between matched pairs of angle
brackets (< . . . >), while entity references start with an
ampersand and end with a semicolon (& . . . ;). In HTML the
set of markup tags is fixed and relatively small. In XML
documents, the form and composition of markup tags can be
defined by users, but are often defined by a trade association
or similar body in order to provide interoperability between
users. XML tag sets are based on the logical structure of the
document and, consequently, they are easy to read and
understand.

XML can represent a greater variety of documents and,
since different documents have different parts or
components, it is not practical to predefine tags for all
elements of all documents. Instead, documents can be
classified into "types" which have certain elements. A document
type definition (DTD) indicates which elements to expect in
a document type and indicates whether each element found
in the document is not allowed, allowed and required, or
allowed but not required. By defining the role of each
document element in a DTD, it is possible to check that each
element occurs in a valid place within the document. For
example, an XML DTD allows a check to be made that a
third-level heading is not entered without the existence of a
second-level heading.

It would be convenient if the tags in the information
returned from the aforementioned queries identified sections
of the document which were relevant to the inventive
shopping bot, such as item description, price, etc. However, in
most cases, the information returned is coded for display on
a browser and must be processed further to extract the
desired shopping information. An example of information
returned from a query is given below. This information is
coded using HTML codes and is intended for use by a
browser such as the Netscape or Internet Explorer browsers
mentioned previously.



<html>
<head>
<title> Books Found by Search </title>
</head>
<body bgcolor="#FFFFFF">
<div align="center"><center>
<table border="0" width="750">
<tr>
<td width="375"><font color="#000000" size="5"><strong>
Books Found by Search: <br>
<br>
</strong></font><font color="#000000" size="3">
Your search brought up 1 titles. <br>
Click on a title for more information. </font></td>
<td valign="top" width="375"><font size="4">
<img src="/images/future.gif" width="20" height="20">
This icon represents new and upcoming releases.<br>
</font>For current availability info, please click on the title. </td>
<table border="0" width="750">
<tr><td width="750">
Your search result is sorted by publication date with most recent one first.
</td></tr></table></center></div>
<!--ISBN:0944058316-->
<div align="center"><center>
<table border="0" width="750">
<tr>
<td width="40"><font size="4"><strong>1.</strong></font></td>
<td width="710" colspan="5"><a name="0370994"
href="/scripts/detail4.exe?/results/b9bflcb4.html-0370994">
<font size="4">How to Settle Child and Spousal Support; With CalSupport
Software With 3.5 Disk</font></a></td>
</tr>
<tr>
<td width="40"> </td>
<td width="710" colspan="5">Author:Sherman, Ed -
Subject: Domestic Relations - Divorce & Separation - Pub. Date:
1/1998</td>
</tr>
<tr>
<td width="40"> </td>
<td width="710" colspan="5"> Pub. Price:$29.95 ~
Kingbooks.com Price:
<font color="#FF0000">$23.96</font>
<font color="#000000">-</font>You Save:
<font color="#FF0000">$5.99</font></td>
</tr>
</table></center> </div>
<div align="center"><center>
<img src="images/redline.gif" width="750" height="4">
</center></div></body></html>



This information must be processed in order to extract the
relevant information, a procedure performed by an ALO. As
previously mentioned, a separate ALO thread is spawned for
each result set received by the SEK 206 in order to reduce
processing time. An illustrative processing routine is
illustrated in the flowchart shown in FIG. 4. The routine
starts in step 400 and proceeds to step 402. In step 402, a
filtering mechanism removes formatting information and
attributes. This filter can be implemented with a parsing
mechanism which identifies the tags. Such a parsing mechanism
is well-known for presentation languages, such as HTML and
XML. Next, the identified tags are compared to a
predetermined tag list and the formatting tags, such as <html>,
<head>, <title>, <font>, <br>, etc. are removed. In addition,
formatting attributes in the tags are also removed. In the case
of the above example, the remaining information will be:



<table> <tr>
    <td>
        Books Found by Search:
        Your search brought up 1 titles.
        Click on a title for more information.
    </td>
    <td>
        This icon represents new an upcoming releases.
        For current availability info, please click on the title.
    </td>
  </tr>
<table> <tr>
    <td>
        Your search result is sorted by publication date with most
        recent one first.
  </tr>
</table>
<table> <tr>
    <td>
        1.
    </td>
    <td>
        <a name="0370994"
        href="/scripts/detail4.exe?/results/b9bflcb4.html-0370994">How to Settle Child and Spousal Support; With
        CalSupport Software With 3.5 Disk
    </td>
  </tr>
  <tr>
    <td>
    </td>
    <td>
        Author: Sherman, Ed Subject: Domestic Relations -
        Divorce & Separation Pub. Date: 1/1998
    </td>
  </tr>
  <tr>
    <td>
    </td>
    <td>
        Pub. Price: $29.95 ~ Kingbooks.com Price: $23.96 - You
        Save: $5.99
    </td>
  </tr>
</table>
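The filtering of step 402 can be sketched roughly as follows. This is a minimal illustration built on Python's standard `html.parser` module, not the implementation described in the patent. The names `FormatFilter` and `filter_formatting`, and the contents of the tag and attribute lists, are assumptions chosen for the example.

```python
from html.parser import HTMLParser

# Hypothetical "predetermined tag list": tags treated as pure formatting.
FORMATTING_TAGS = {"html", "head", "title", "body", "font", "br",
                   "center", "div", "img", "b", "i"}
# Hypothetical list of formatting attributes stripped from surviving tags.
FORMATTING_ATTRS = {"width", "height", "colspan", "align", "color",
                    "size", "border", "bgcolor"}


class FormatFilter(HTMLParser):
    """Collects structural tags and text, dropping formatting markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        if tag in FORMATTING_TAGS:
            return
        # Keep only non-formatting attributes that carry a value.
        kept = "".join(
            f' {name}="{value}"'
            for name, value in attrs
            if name not in FORMATTING_ATTRS and value is not None
        )
        self.tokens.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag not in FORMATTING_TAGS:
            self.tokens.append(f"</{tag}>")

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text:
            self.tokens.append(text)


def filter_formatting(markup):
    """Return the markup with formatting tags and attributes removed,
    one surviving tag or text run per line."""
    parser = FormatFilter()
    parser.feed(markup)
    parser.close()
    return "\n".join(parser.tokens)
```

Applied to a cell such as `<td width="710" colspan="5"><font color="#FF0000">$23.96</font></td>`, the sketch keeps the `<td>` pair and the price text and drops the rest.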



Next, as indicated in step 404, the remaining information
is parsed into a data tree. In the case of HTML and XML, the
language is naturally hierarchical so that this parsing is
relatively easy. The example given immediately above splits
into three separate trees which are comprised of hierarchical
nodes indicated by the indented sections (the indents were
added to emphasize the sections). These trees are illustrated
in FIGS. 5, 6 and 7 and are delineated by the <table></table>
tags.

FIG. 5 illustrates the first tree comprised of a first node 
500 consisting of information contained between the 
<table></table> tags, a second node 502 delineated by the 
<tr></tr> tags and two third nodes, 504 and 506, delineated 
by the <td></td> tags. In a similar manner, FIG. 6 illustrates
the second tree comprised of a first node 600 consisting of
information contained between the <table></table> tags, a 
second node 602 delineated by the <tr></tr> tags and a third 
node 604 delineated by the <td></td> tags. FIG. 7 illustrates 
the third tree comprised of a first node 700 consisting of 
information contained between the <table></table> tags, 
three second nodes 702, 704 and 706 delineated by the 
<tr></tr> tags and six third nodes 708-718 delineated by the 
<td></td> tags. 
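The parsing of step 404 might look like the following sketch, which builds one tree per top-level `<table>` element with `<tr>` and `<td>` children. It assumes well-formed filtered markup, and the `Node`, `TreeBuilder` and `parse_trees` names are hypothetical rather than taken from the patent.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class Node:
    tag: str
    text: str = ""
    children: list = field(default_factory=list)


class TreeBuilder(HTMLParser):
    """Builds a tree of table/tr/td nodes; other tags only contribute text."""

    STRUCTURAL = ("table", "tr", "td")

    def __init__(self):
        super().__init__()
        self.trees = []   # one root node per top-level <table>
        self.stack = []   # path from the current root to the open node

    def handle_starttag(self, tag, attrs):
        if tag not in self.STRUCTURAL:
            return
        node = Node(tag)
        if self.stack:
            self.stack[-1].children.append(node)
        else:
            self.trees.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1].tag == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = " ".join(data.split())
        if text and self.stack:
            node = self.stack[-1]
            node.text = f"{node.text} {text}".strip()


def parse_trees(markup):
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.trees
```

For markup containing two sibling tables, `parse_trees` returns two roots. Each root and each row node has empty text, and the text sits in the `<td>` leaves, which mirrors the empty first-level and second-level nodes described for FIGS. 5 to 7.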

As set forth in step 406, the nodes in each tree are
examined to determine whether they contain relevant information.
These nodes are examined in sequence, level-by-level,
as illustrated by the arrows in the figures to detect a
"complete" node level which contains some or all of the
desired information. For example, in FIG. 5, node 500 is first
examined. Since it is empty, node 502 at the second level is
next examined. It is also empty so that nodes 504 and 506
at the third level are examined. Nodes 504 and 506 contain
information and this information is examined as discussed
below. However, since the information contained in nodes
504 and 506 is not relevant information, such as the title,
author or price of a book in the example given above, the
entire tree, including nodes 500 and 502, is removed from
consideration. If no relevant information is found as determined
in step 408 (FIG. 4), then the process proceeds back
to step 406.

Next, in step 406, the tree illustrated in FIG. 6 is examined.
Node 600 is first examined. Since it is empty, node 602
is next examined. It is also empty so that node 604 is
examined. Node 604 contains information and this information
is examined as discussed below. However, since node
604 does not contain relevant information, the entire tree,
including nodes 600 and 602, is removed from consideration.
The process then continues from step 408 back to step 406.

Next, the tree illustrated in FIG. 7 is examined. Node 700
is first examined. Since it is empty, nodes 702, 704 and 706
are examined. These nodes are also empty so that nodes
708-718 are examined. All of these nodes contain information
which is examined. Nodes 710, 714 and 718 contain
relevant information so that, in step 410, this information is
extracted by an extraction mechanism and mapped to corresponding
buffers. In the above example, the result would
be:



Title                    Author      Description               Price

How to Settle Child      Sherman,    Domestic Relations -      $23.96
and Spousal Support;     Ed -        Divorce & Separation -
With CalSupport                      Pub Date: 1/1998
Software With 3.5 Disk               Pub. Price: $29.95
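The level-by-level examination of steps 406 to 410 can be sketched as below. This is an illustration under stated assumptions: the `Node` type, the function names and the `is_relevant` callback are hypothetical, and the relevance test itself is left to the caller because the patent defines it through category-specific rules.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    text: str = ""
    children: list = field(default_factory=list)


def first_nonempty_level(root):
    """Walk the tree one level at a time and return the first level
    in which at least one node carries text."""
    level = [root]
    while level:
        if any(node.text for node in level):
            return level
        level = [child for node in level for child in node.children]
    return []


def extract_relevant(trees, is_relevant):
    """Collect relevant node texts; a tree with no relevant node
    contributes nothing, i.e. it is dropped from consideration."""
    hits = []
    for root in trees:
        level = first_nonempty_level(root)
        hits.extend(node.text for node in level if is_relevant(node.text))
    return hits
```

With a tree shaped like FIG. 5 (two informational cells) and one shaped like FIG. 7, a simple keyword-based `is_relevant` discards the first tree entirely and returns only the matching cells of the second.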





In step 412, the buffered information is returned to the
SEK where it is formatted for display on the user's computer.
The process then ends in step 414. In order to extract
the information in each node, the information is checked
against a rules set which is specific to the category which is
being examined. Each rule in the set defines the character of
one or more fields. For example, one rule might specify that
a node is complete if it contains all relevant fields. In the
aforementioned example, these fields are title, author,
description and price. If one field is missing, another rule
might require further checking to determine whether the
node is complete and information from that node should be
extracted. For example, if a price field and a name field are
found in a node, the node is very likely to contain relevant
data.
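The two completeness rules just described could be expressed along the following lines. This is only a sketch. The `REQUIRED_FIELDS` table, the `classify` function, the three result labels, and the use of the title as the "name" field for books are all assumptions made for illustration.

```python
# Hypothetical per-category rule data: the fields a complete node must have.
REQUIRED_FIELDS = {
    "books": ("title", "author", "description", "price"),
}


def classify(fields, category):
    """Classify a node from the fields found in it.

    fields maps field names to extracted strings (missing or empty
    values count as absent).
    """
    required = REQUIRED_FIELDS[category]
    present = {name for name in required if fields.get(name)}
    if len(present) == len(required):
        return "complete"          # rule 1: all relevant fields found
    if "price" in present and "title" in present:
        return "likely"            # rule 2: price plus name found
    return "incomplete"
```

A node classified as "likely" would then be subject to whatever further checking the rule set prescribes before its data is extracted.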

Within each node relevant information is located by
searching for keywords, symbols or data types which are
specific to each category. Then, words in the vicinity of these
keywords could be examined to find relevant information.
For example, each node may be examined for character
strings such as "name", "title", "description", "price", or
"author." If any of these keywords are found, then the
subsequent characters will be considered relevant information.
Another rule might define a price field as the smallest
number encountered in the node with, or without, a preceding
"$" symbol. A price field may also be defined as a
number with or without a preceding "PRICE" keyword.
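The two price rules mentioned above can be sketched with regular expressions as follows. The function names and the exact patterns are assumptions made for this example, not the rules used by the patented system.

```python
import re

# A number, optionally preceded by a "$" symbol.
NUMBER = re.compile(r"\$?\s*(\d+(?:\.\d+)?)")
# A number that follows the keyword "price" (case-insensitive).
PRICE_KEYWORD = re.compile(r"price\s*:?\s*\$?\s*(\d+(?:\.\d+)?)",
                           re.IGNORECASE)


def smallest_number(text):
    """Rule A: the price is the smallest number found in the node."""
    values = [float(v) for v in NUMBER.findall(text)]
    return min(values) if values else None


def numbers_after_price_keyword(text):
    """Rule B: candidate prices are numbers preceded by 'PRICE'."""
    return [float(v) for v in PRICE_KEYWORD.findall(text)]
```

On the node text from the example, "Pub. Price: $29.95 ~ Kingbooks.com Price: $23.96 - You Save: $5.99", this sketch of rule A returns 5.99, the savings amount, while rule B returns 29.95 and 23.96. That suggests such rules would in practice need to be combined or ordered per category.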

The rules can be generated in a variety of ways. In one
embodiment, rules are generated by a programmer for each
merchant site and maintained by the programmer. In this
embodiment, the ALO which is processing the received data
will look for a rule in the rule set which matches the data and
use the rule to extract the data. In another embodiment, the
search results are parsed to tree nodes as described above
and the ALO will check each node for keywords which are
selected from a keyword set which depends on the user-selected
category. The information following these predetermined
keywords is then extracted. In this version, separate
rules are not needed for each site and the rules do not
need to be maintained by a programmer.
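The second embodiment, in which a category-dependent keyword set drives extraction, might be sketched as follows. The `CATEGORY_KEYWORDS` table, its contents and the `split_fields` function are hypothetical. The sketch treats the text between one keyword and the next as the value of the first.

```python
import re

# Hypothetical keyword sets, selected by the user's category.
CATEGORY_KEYWORDS = {
    "books": ("title", "author", "subject", "pub. date", "price"),
}


def split_fields(text, category):
    """Return {keyword: following text} for each keyword found in text."""
    keywords = CATEGORY_KEYWORDS[category]
    pattern = re.compile(
        "(" + "|".join(re.escape(k) for k in keywords) + r")\s*[:;]\s*",
        re.IGNORECASE,
    )
    matches = list(pattern.finditer(text))
    fields = {}
    for current, following in zip(matches, matches[1:] + [None]):
        # The value runs up to the next keyword, or to the end of the text.
        end = following.start() if following else len(text)
        fields[current.group(1).lower()] = text[current.end():end].strip(" -")
    return fields
```

Because the keyword set is looked up by category rather than by merchant, the same function can be applied to nodes from any site in that category.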

Alternatively, provision can be made to allow a user to
manually select a block of data in a particular level of a data
tree so that the program can search and extract that block of data
each time results are returned. Because rules can be written
for each field, each ALO is very flexible and generic. This
flexibility allows different merchants to be added into the
search and comparison pool quickly and easily. It also
allows different information to be retrieved from a site
depending on the customer or marketing needs.

In an alternative embodiment, the system may be implemented
as a computer program product for use with a
computer system. Such implementation may include a series
of computer instructions fixed either on a tangible medium,
such as a computer readable media (e.g., a diskette, a CD or
non-volatile storage) or transmittable to a computer system,
via a modem or other interface device, such as a network.
The series of computer instructions embodies all or part of
the functionality previously described herein with respect to
the system. Those skilled in the art should appreciate that
such computer instructions can be written in a number of
programming languages for use with many computer architectures
or operating systems. Furthermore, such instructions
may be stored in any memory device, such as
semiconductor, magnetic, optical or other memory devices,
and may be transmitted using any communications
technology, such as optical, infrared, microwave, or other
transmission technologies. It is expected that such a computer
program product may be distributed as a removable
media with accompanying printed or electronic documentation
(e.g., shrink wrapped software), preloaded with a
computer system (e.g., on system ROM or fixed disk), or
distributed from a server or electronic bulletin board over
the network (e.g., the Internet or World Wide Web).

Although various exemplary embodiments of the invention
have been disclosed, it will be apparent to those skilled in
the art that various changes and modifications can be made
that will achieve some of the advantages of the invention
without departing from the true scope of the invention.
These and other obvious modifications are intended to be
covered by the appended claims.

What is claimed is:

1. Apparatus for retrieving comparative item information
from a plurality of merchant sites having disparate information
formats in response to a request, including a category
and a keyword, from a user, the apparatus comprising:

a database containing a plurality of categories and, for
each category, at least one URL for one of the plurality
of merchant sites;

a query generator responsive to the request category for
composing a query by concatenating a URL obtained
from the database with the request category with the
request keyword wherein the query generator composes
a plurality of queries for the requested category;

a search engine for retrieving information from the plurality
of merchant sites with the query; and

an automatic learning object for processing retrieved
information to extract the item information and wherein
a plurality of automatic learning objects are created in
parallel with automatic learning object being created to
process information retrieved from each query.

2. Apparatus according to claim 1 wherein the query
generator composes a plurality of queries for the requested
category and wherein a plurality of search engines are
created in parallel with one search engine being constructed
for each query.

3. Apparatus according to claim 1 wherein the automatic
learning object comprises:

a filter for removing formatting information in the
retrieved information; and

a parser for parsing the filtered information into one or
more data trees, each data tree having one or more
nodes.

4. Apparatus according to claim 3 wherein the automatic
learning object further comprises:

a mechanism which examines each node for relevant
information; and

an extraction mechanism which extracts the item information
from the relevant information.

5. Apparatus according to claim 1 wherein the retrieved
information is coded in HTML code and wherein the automatic
learning object processes the HTML code to remove
HTML formatting tags.

6. Apparatus according to claim 1 wherein the retrieved
information is coded in XML code and wherein the automatic
learning object processes the XML code to remove
XML formatting tags.

7. Apparatus according to claim 1 wherein the database
includes at least one URL for a search engine located in one
of the plurality of merchant sites.

8. A method for retrieving comparative item information
from a plurality of merchant sites having disparate information
formats in response to a request, including a category
and a keyword, from a user, the method comprising:

(a) constructing a database containing a plurality of
categories and, for each category, at least one URL for
one of the plurality of merchant sites;

(b) composing a query in response to the request category
by concatenating a URL obtained from the database
with the request category with the request keyword;

(c) using a search engine to retrieve information from the
plurality of merchant sites with the query;

(d) creating an automatic learning object for processing
retrieved information to extract the item information;

(e) wherein step (b) comprises composing a plurality of
queries for the requested category and wherein step (d)
comprises creating a plurality of automatic learning
objects in parallel with one automatic learning object
being created to process information retrieved from
each query.

9. A method according to claim 8 wherein step (b)
comprises composing a plurality of queries for the requested
category and wherein step (c) comprises using a plurality of
search engines in parallel with one search engine being used
for each query.

10. A method according to claim 8 wherein step (d)
comprises:

(d1) removing formatting information in the retrieved
information; and

(d2) parsing the filtered information into one or more data
trees, each data tree having one or more nodes.

11. A method according to claim 10 wherein step (d)
further comprises:

(d3) examining each node for relevant information; and

(d4) extracting the item information from the relevant
information.

12. A method according to claim 8 wherein the retrieved
information is coded in HTML code and wherein step (d)
comprises processing the HTML code to remove HTML
formatting tags.

13. A method according to claim 8 wherein the retrieved
information is coded in XML code and wherein step (d)
comprises processing the XML code to remove XML formatting
tags.

14. A method according to claim 8 wherein the database
is constructed to include at least one URL for a search engine
located in one of the plurality of merchant sites.

15. A computer program product for retrieving comparative
item information from a plurality of merchant sites
having disparate information formats in response to a
request, including a category and a keyword, from a user, the
computer program product comprising a computer usable
medium having computer readable program code thereon,
including:

program code for constructing a database containing a
plurality of categories and, for each category, at least
one URL for one of the plurality of merchant sites;

program code for composing a query in response to the
request category by concatenating a URL obtained
from the database with the request category with the
request keyword;

program code for creating a search engine to retrieve 
information from the plurality of merchant sites with 
the query; 

program code for creating an automatic learning object 
for processing retrieved information to extract the item 
information; and 

wherein the program code for composing a query comprises
program code for composing a plurality of
queries for the requested category and wherein the
program code for creating a search engine comprises
program code for creating a plurality of automatic
learning objects in parallel with one automatic learning
object being created to process information retrieved
from each query.

16. A computer program product according to claim 15
wherein the program code for composing a query comprises
program code for composing a plurality of queries for the
requested category and wherein the program code for creating
a search engine comprises program code for creating
a plurality of search engines in parallel with one search
engine being used for each query.

17. A computer program product according to claim 15
wherein the program code for creating an automatic learning
object comprises:

program code for removing formatting information in the
retrieved information; and

program code for parsing the filtered information into one
or more data trees, each data tree having one or more
nodes.

18. A computer program product according to claim 17 
wherein the program code for creating an automatic learning 
object further comprises: 

program code for examining each node for relevant
information; and

program code for extracting the item information from the
relevant information.

19. A computer program product according to claim 15 
wherein the retrieved information is coded in HTML code 
and wherein the program code for creating an automatic 
learning object comprises program code for processing the 
HTML code to remove HTML formatting tags. 

20. A computer program product according to claim 15 
wherein the retrieved information is coded in XML code and 
wherein the program code for creating an automatic learning 
object comprises program code for processing the XML 
code to remove XML formatting tags. 
21. A computer program product according to claim 15
wherein the database is constructed to include at least one 
URL for a search engine located in one of the plurality of 
merchant sites. 

22. A computer data signal embodied in a carrier wave for
retrieving comparative item information from a plurality of
merchant sites having disparate information formats in
response to a request, including a category and a keyword,
from a user, the computer data signal comprising:

program code for constructing a database containing a
plurality of categories and, for each category, at least
one URL for one of the plurality of merchant sites;

program code for composing a query in response to the 
request category by concatenating a URL obtained 
from the database with the request category with the 
request keyword; 

program code for creating a search engine to retrieve 
information from the plurality of merchant sites with 
the query; 

program code for creating an automatic learning object 
for processing retrieved information to extract the item 
information; and 

wherein the program code for composing a query com- 
prises program code for composing a plurality of 
queries for the requested category and wherein the 
program code for creating a search engine comprises 
program code for creating a plurality of automatic 
learning objects in parallel with one automatic learning 
object being created to process information retrieved 
from each query. 

23. A computer data signal according to claim 22 wherein 
the program code for composing a query comprises program 
code for composing a plurality of queries for the requested 
category and wherein the program code for creating a search 
engine comprises program code for creating a plurality of 
search engines in parallel with one search engine being used 
for each query. 

24. A computer data signal according to claim 20 wherein 
the program code for creating an automatic learning object 
comprises: 

program code for removing formatting information in the
retrieved information; and

program code for parsing the filtered information into one
or more data trees, each data tree having one or more
nodes.

25. A computer program product according to claim 22 
wherein the database is constructed to include at least one 
URL for a search engine located in one of the plurality of 
merchant sites. 

* * * * *